*TITLE*

*Exploratory Data Analysis and Accident Reduction Strategies for Road Accidents in India*

*AUTHOR*

*S Pooja*

*📋 PROJECT SUMMARY:*

*This project focuses on analyzing accident data from India through Exploratory Data Analysis (EDA) techniques. The aim is to identify key patterns related to accident severity, states, time of day, road and weather conditions, alcohol involvement, and driver demographics. Based on the findings, strategic recommendations are proposed to reduce accidents, especially focusing on poor road conditions and under-construction zones.*

*Source Information:*

*The dataset includes 3,000 accident records spanning 2018 to 2023, with detailed attributes such as accident severity, weather conditions, road type, vehicle involvement, and casualties. It is sourced from Kaggle's India Road Accident Dataset Predictive Analysis dataset by Khushi Yadav.*

In [4]:
#Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
In [6]:
# Load the Data
df = pd.read_csv('accident_prediction_india.csv') 
df
Out[6]:
State Name City Name Year Month Day of Week Time of Day Accident Severity Number of Vehicles Involved Vehicle Type Involved Number of Casualties ... Road Type Road Condition Lighting Conditions Traffic Control Presence Speed Limit (km/h) Driver Age Driver Gender Driver License Status Alcohol Involvement Accident Location Details
0 Jammu and Kashmir Unknown 2021 May Monday 1:46 Serious 5 Cycle 0 ... National Highway Wet Dark Signs 61 66 Male NaN Yes Curve
1 Uttar Pradesh Lucknow 2018 January Wednesday 21:30 Minor 5 Truck 5 ... Urban Road Dry Dusk Signs 92 60 Male NaN Yes Straight Road
2 Chhattisgarh Unknown 2023 May Wednesday 5:37 Minor 5 Pedestrian 6 ... National Highway Under Construction Dawn Signs 120 26 Female NaN No Bridge
3 Uttar Pradesh Lucknow 2020 June Saturday 0:31 Minor 3 Bus 10 ... State Highway Dry Dark Signals 76 34 Female Valid Yes Straight Road
4 Sikkim Unknown 2021 August Thursday 11:21 Minor 5 Cycle 7 ... Urban Road Wet Dusk Signs 115 30 Male NaN No Intersection
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2995 Tamil Nadu Chennai 2021 January Sunday 1:15 Minor 5 Truck 4 ... National Highway Wet Dark Signs 74 43 Male Expired Yes Intersection
2996 Uttarakhand Unknown 2018 July Sunday 10:12 Fatal 3 Car 3 ... Urban Road Under Construction Daylight NaN 86 23 Female NaN Yes Intersection
2997 Meghalaya Unknown 2021 January Thursday 19:34 Minor 2 Two-Wheeler 8 ... National Highway Dry Dark Signs 47 57 Female Valid Yes Intersection
2998 Meghalaya Unknown 2023 June Sunday 20:54 Fatal 1 Cycle 9 ... Urban Road Under Construction Daylight Signs 60 28 Female Expired Yes Bridge
2999 Arunachal Pradesh Unknown 2020 September Monday 7:19 Fatal 5 Cycle 1 ... National Highway Under Construction Daylight NaN 40 66 Male NaN Yes Bridge

3000 rows × 22 columns

In [8]:
#View First Few Rows (Head)
df.head()
Out[8]:
State Name City Name Year Month Day of Week Time of Day Accident Severity Number of Vehicles Involved Vehicle Type Involved Number of Casualties ... Road Type Road Condition Lighting Conditions Traffic Control Presence Speed Limit (km/h) Driver Age Driver Gender Driver License Status Alcohol Involvement Accident Location Details
0 Jammu and Kashmir Unknown 2021 May Monday 1:46 Serious 5 Cycle 0 ... National Highway Wet Dark Signs 61 66 Male NaN Yes Curve
1 Uttar Pradesh Lucknow 2018 January Wednesday 21:30 Minor 5 Truck 5 ... Urban Road Dry Dusk Signs 92 60 Male NaN Yes Straight Road
2 Chhattisgarh Unknown 2023 May Wednesday 5:37 Minor 5 Pedestrian 6 ... National Highway Under Construction Dawn Signs 120 26 Female NaN No Bridge
3 Uttar Pradesh Lucknow 2020 June Saturday 0:31 Minor 3 Bus 10 ... State Highway Dry Dark Signals 76 34 Female Valid Yes Straight Road
4 Sikkim Unknown 2021 August Thursday 11:21 Minor 5 Cycle 7 ... Urban Road Wet Dusk Signs 115 30 Male NaN No Intersection

5 rows × 22 columns

In [10]:
#View Last Few Rows (Tail)
df.tail()
Out[10]:
State Name City Name Year Month Day of Week Time of Day Accident Severity Number of Vehicles Involved Vehicle Type Involved Number of Casualties ... Road Type Road Condition Lighting Conditions Traffic Control Presence Speed Limit (km/h) Driver Age Driver Gender Driver License Status Alcohol Involvement Accident Location Details
2995 Tamil Nadu Chennai 2021 January Sunday 1:15 Minor 5 Truck 4 ... National Highway Wet Dark Signs 74 43 Male Expired Yes Intersection
2996 Uttarakhand Unknown 2018 July Sunday 10:12 Fatal 3 Car 3 ... Urban Road Under Construction Daylight NaN 86 23 Female NaN Yes Intersection
2997 Meghalaya Unknown 2021 January Thursday 19:34 Minor 2 Two-Wheeler 8 ... National Highway Dry Dark Signs 47 57 Female Valid Yes Intersection
2998 Meghalaya Unknown 2023 June Sunday 20:54 Fatal 1 Cycle 9 ... Urban Road Under Construction Daylight Signs 60 28 Female Expired Yes Bridge
2999 Arunachal Pradesh Unknown 2020 September Monday 7:19 Fatal 5 Cycle 1 ... National Highway Under Construction Daylight NaN 40 66 Male NaN Yes Bridge

5 rows × 22 columns

In [12]:
#Check Data Shape
df.shape
Out[12]:
(3000, 22)
In [14]:
#Check Data Types (dtypes)
df.dtypes
Out[14]:
State Name                     object
City Name                      object
Year                            int64
Month                          object
Day of Week                    object
Time of Day                    object
Accident Severity              object
Number of Vehicles Involved     int64
Vehicle Type Involved          object
Number of Casualties            int64
Number of Fatalities            int64
Weather Conditions             object
Road Type                      object
Road Condition                 object
Lighting Conditions            object
Traffic Control Presence       object
Speed Limit (km/h)              int64
Driver Age                      int64
Driver Gender                  object
Driver License Status          object
Alcohol Involvement            object
Accident Location Details      object
dtype: object
In [153]:
#duplicate value
df.duplicated().sum()
Out[153]:
0
In [16]:
#Check for Missing Values
df.isnull().sum()
Out[16]:
State Name                       0
City Name                        0
Year                             0
Month                            0
Day of Week                      0
Time of Day                      0
Accident Severity                0
Number of Vehicles Involved      0
Vehicle Type Involved            0
Number of Casualties             0
Number of Fatalities             0
Weather Conditions               0
Road Type                        0
Road Condition                   0
Lighting Conditions              0
Traffic Control Presence       716
Speed Limit (km/h)               0
Driver Age                       0
Driver Gender                    0
Driver License Status          975
Alcohol Involvement              0
Accident Location Details        0
dtype: int64
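
The raw counts above can be easier to judge as a share of all rows. The following is a minimal sketch that uses a small made-up frame in place of the real CSV; the `sample` data and the `missing_report` helper are illustrative assumptions and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the accident data; all values are made up.
sample = pd.DataFrame({
    "Traffic Control Presence": ["Signs", np.nan, "Signals", np.nan],
    "Driver License Status": ["Valid", np.nan, "Expired", "Valid"],
    "Driver Age": [25, 40, 33, 61],
})

def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages for columns with gaps."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only columns that actually have missing values, largest first.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)

report = missing_report(sample)
print(report)
```

Applied to the full dataset, 716 and 975 missing values out of 3,000 rows correspond to roughly 23.9% and 32.5% of the records.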
In [18]:
plt.figure(figsize=(16,6))
sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='viridis')
Out[18]:
<Axes: >
In [22]:
# Replace missing values with "Unknown" (plain assignment avoids chained in-place modification)
df['Traffic Control Presence'] = df['Traffic Control Presence'].fillna('Unknown')
In [24]:
df['Driver License Status'] = df['Driver License Status'].fillna('Unknown')
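
As an alternative, both columns could be filled in a single call by passing a dictionary to `DataFrame.fillna`. This is only a sketch on a made-up frame; `sample` and `fill_values` are illustrative names, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the two columns with missing values.
sample = pd.DataFrame({
    "Traffic Control Presence": ["Signs", np.nan, "Signals"],
    "Driver License Status": ["Valid", np.nan, "Expired"],
})

# Map each column to its replacement value; other columns are left alone.
fill_values = {
    "Traffic Control Presence": "Unknown",
    "Driver License Status": "Unknown",
}

# fillna returns a new frame, so the original stays available for comparison.
filled = sample.fillna(value=fill_values)
print(filled.isnull().sum())
```

Keeping "Unknown" as its own category, rather than dropping rows, preserves all 3,000 records for the later plots.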
In [26]:
plt.figure(figsize=(16,6))
sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='viridis')
Out[26]:
<Axes: >
In [28]:
#Statistical Summary (Describe)
df.describe()
Out[28]:
              Year  Number of Vehicles Involved  Number of Casualties  Number of Fatalities  Speed Limit (km/h)  Driver Age
count  3000.000000                  3000.000000           3000.000000           3000.000000         3000.000000  3000.00000
mean   2020.530000                     2.996000              5.066000              2.455333           74.940667    44.17700
std       1.683858                     1.428285              3.214097              1.717650           26.765088    15.40286
min    2018.000000                     1.000000              0.000000              0.000000           30.000000    18.00000
25%    2019.000000                     2.000000              2.000000              1.000000           51.000000    31.00000
50%    2021.000000                     3.000000              5.000000              2.000000           75.000000    45.00000
75%    2022.000000                     4.000000              8.000000              4.000000           99.000000    57.00000
max    2023.000000                     5.000000             10.000000              5.000000          120.000000    70.00000
In [30]:
df.describe(include='object')
Out[30]:
State Name City Name Month Day of Week Time of Day Accident Severity Vehicle Type Involved Weather Conditions Road Type Road Condition Lighting Conditions Traffic Control Presence Driver Gender Driver License Status Alcohol Involvement Accident Location Details
count 3000 3000 3000 3000 3000 3000 3000 3000 3000 3000 3000 3000 3000 3000 3000 3000
unique 32 28 12 7 1263 3 7 5 4 4 4 4 2 3 2 4
top Goa Unknown March Wednesday 8:34 Minor Truck Rainy State Highway Under Construction Dark Signs Female Valid Yes Intersection
freq 109 2138 266 468 7 1034 449 631 771 778 763 812 1563 1057 1520 789
In [34]:
#Unique Value Counts for Categorical Columns
categorical_columns = df.select_dtypes(include=['object'])  
unique_counts = categorical_columns.nunique() 
print(unique_counts)
State Name                     32
City Name                      28
Month                          12
Day of Week                     7
Time of Day                  1263
Accident Severity               3
Vehicle Type Involved           7
Weather Conditions              5
Road Type                       4
Road Condition                  4
Lighting Conditions             4
Traffic Control Presence        4
Driver Gender                   2
Driver License Status           3
Alcohol Involvement             2
Accident Location Details       4
dtype: int64

Time of Day has a very high number of unique values (1,263 exact accident times), so it is grouped by hour later in the analysis.

📊 Visual Exploratory Data Analysis (EDA)

Accident Severity Distribution

In [45]:
fig = px.pie(df, names='Accident Severity', title='Accident Severity Distribution')
fig.show()
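
The pie chart shows proportions visually; the same shares can also be tabulated. A minimal sketch, assuming a made-up `severity` series in place of the real `Accident Severity` column:

```python
import pandas as pd

# Hypothetical severity labels; the real column has 3,000 entries.
severity = pd.Series(
    ["Minor", "Fatal", "Minor", "Serious", "Minor", "Fatal", "Serious", "Minor"],
    name="Accident Severity",
)

# Combine absolute counts and percentage shares in one table.
summary = pd.DataFrame({
    "count": severity.value_counts(),
    "percent": (severity.value_counts(normalize=True) * 100).round(1),
})
print(summary)
```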

Accidents by State

In [53]:
df_counts = df['State Name'].value_counts()
fig = px.bar(x=df_counts.index, y=df_counts.values,
             labels={'x': 'State', 'y': 'Accidents'},
             title='Accidents by State')
fig.show()
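
Accident counts alone do not show how severe the accidents in each state are. One possible extension, sketched here on made-up rows, is to aggregate the number of accidents and the total fatalities per state. The `sample` frame and the `by_state` name are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical rows; state names are taken from the dataset, values are made up.
sample = pd.DataFrame({
    "State Name": ["Goa", "Delhi", "Goa", "Sikkim", "Goa", "Delhi"],
    "Number of Fatalities": [1, 0, 2, 3, 0, 1],
})

# Count rows and sum fatalities per state, then rank by accident count.
by_state = (
    sample.groupby("State Name")
    .agg(
        accidents=("Number of Fatalities", "size"),
        fatalities=("Number of Fatalities", "sum"),
    )
    .sort_values("accidents", ascending=False)
)
print(by_state)
```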

Accidents by Weather Condition

In [64]:
import plotly.express as px

df_counts = df['Weather Conditions'].value_counts().reset_index()
df_counts.columns = ['Weather', 'Accidents']  

fig = px.bar(df_counts, x='Weather', y='Accidents',
             labels={'Weather': 'Weather', 'Accidents': 'Accidents'},
             title='Weather Conditions in Accidents',
             color='Weather') 
fig.show()

Accidents by Road Condition

In [68]:
df_counts = df['Road Condition'].value_counts().reset_index()
df_counts.columns = ['Road Condition', 'Accidents'] 

fig = px.bar(
    df_counts,
    x='Road Condition',
    y='Accidents',
    labels={'Road Condition': 'Road Condition', 'Accidents': 'Accidents'},
    title='Road Condition During Accidents',
    color='Road Condition' 
)

fig.show()
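
Because the recommendations focus on road conditions, it may help to check how severity is distributed within each condition rather than looking at counts alone. The sketch below uses `pd.crosstab` on a made-up frame; the values and the `fatal_share` name are illustrative assumptions.

```python
import pandas as pd

# Hypothetical records; category names follow the dataset, values are made up.
sample = pd.DataFrame({
    "Road Condition": ["Wet", "Dry", "Under Construction",
                       "Under Construction", "Wet", "Dry"],
    "Accident Severity": ["Minor", "Minor", "Fatal", "Fatal", "Serious", "Fatal"],
})

# Row-normalised table: each row sums to 1, giving the severity mix per condition.
share = pd.crosstab(
    sample["Road Condition"], sample["Accident Severity"], normalize="index"
)

# Rank road conditions by the share of accidents that were fatal.
fatal_share = share["Fatal"].sort_values(ascending=False)
print(fatal_share)
```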

Alcohol Involvement in Accidents

In [74]:
df_counts = df['Alcohol Involvement'].value_counts().reset_index()
df_counts.columns = ['Alcohol Involvement', 'Count']  

fig = px.bar(
    df_counts,
    x='Alcohol Involvement',
    y='Count',
    labels={'Alcohol Involvement': 'Alcohol Involvement', 'Count': 'Count'},
    title='Alcohol Involvement in Accidents',
    color='Alcohol Involvement' 
)

fig.show()

Accidents by Hour of Day

In [80]:
df['Hour'] = pd.to_datetime(df['Time of Day'], format='%H:%M', errors='coerce').dt.hour
In [82]:
import plotly.express as px

fig = px.histogram(df, x='Hour', nbins=24, title='Accidents by Hour of Day')
fig.update_traces(marker=dict(line=dict(color='black', width=1)))
fig.show()
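
In Plotly, `nbins` sets a maximum number of bins, so the histogram is not guaranteed to show exactly one bar per hour. A hedged alternative is to count accidents per hour explicitly and reindex over all 24 hours. The `times` series below is made up for illustration.

```python
import pandas as pd

# Hypothetical times in the same H:MM format as the dataset, plus one bad value.
times = pd.Series(["1:46", "21:30", "5:37", "0:31", "21:05", "bad"],
                  name="Time of Day")

# Unparseable entries become NaT and are dropped before counting.
hours = (
    pd.to_datetime(times, format="%H:%M", errors="coerce")
    .dt.hour.dropna()
    .astype(int)
)

# One entry per hour from 0 to 23; hours without accidents get a count of 0.
hourly = hours.value_counts().reindex(range(24), fill_value=0)
print(hourly[hourly > 0])
```

The resulting `hourly` series could then be passed to a bar chart so that every hour appears on the axis.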

Driver's Age Distribution

In [87]:
fig = px.histogram(df, x='Driver Age', nbins=30, title="Driver's Age Distribution")
fig.update_traces(marker=dict(line=dict(color='black', width=1)))
fig.show()
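
Histogram peaks depend on the bin width, so grouping ages into labelled bands can make comparisons between age groups more explicit. This is a sketch using `pd.cut`; the band edges and the `ages` values are assumptions chosen for illustration, not taken from the original analysis.

```python
import pandas as pd

# Hypothetical driver ages within the dataset's 18 to 70 range.
ages = pd.Series([18, 19, 23, 31, 44, 45, 45, 58, 66, 70], name="Driver Age")

# Left-closed bands: [18, 25), [25, 35), ..., [55, 71) so that age 70 is included.
bins = [18, 25, 35, 45, 55, 71]
labels = ["18-24", "25-34", "35-44", "45-54", "55-70"]
age_band = pd.cut(ages, bins=bins, labels=labels, right=False)

# Count drivers per band, keeping the bands in their natural order.
counts = age_band.value_counts().sort_index()
print(counts)
```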

Correlation Heatmap between Numerical Features

In [127]:
plt.figure(figsize=(7,5))
corr = df[['Number of Vehicles Involved', 'Number of Casualties', 'Number of Fatalities', 'Speed Limit (km/h)', 'Driver Age']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap between numerical Features')
plt.show()
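
To read the heatmap more systematically, the correlation matrix can be flattened into a ranked list of feature pairs. The sketch below works on a made-up frame; `sample` and `pairs` are illustrative names, and the numbers do not come from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric columns; values are made up.
sample = pd.DataFrame({
    "Number of Casualties": [1, 2, 3, 4, 5, 6],
    "Number of Fatalities": [0, 1, 1, 2, 2, 3],
    "Driver Age": [50, 22, 61, 35, 44, 29],
})

corr = sample.corr()

# Keep only the upper triangle so each pair appears once and the diagonal is dropped.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = (
    corr.where(mask)
    .stack()
    .dropna()
    .sort_values(key=abs, ascending=False)
)
print(pairs)
```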

*Summary of Key Insights*

*1. Accident Severity Distribution:*

Minor accidents are the most common category (1,034 of 3,000 records). Serious and fatal accidents make up the remaining share and need targeted action.

*2. High Accident States:*

Goa, Delhi, Sikkim, and a few other states show the highest accident counts.

*3. Weather Conditions:*

Rainy weather is the most common condition among recorded accidents (631 of 3,000 records). Stormy and Hazy conditions also contribute significantly.

*4. Road Conditions:*

Under-construction roads account for the largest share of accidents (778 of 3,000 records), but wet, dry, and damaged roads are risky too.

*5. Alcohol Involvement:*

About half of the recorded accidents (1,520 of 3,000) involve alcohol, which points to a strong need for anti-drunk-driving measures.

*6. Time of Day:*

Accident counts peak around 3 AM, from 5 AM to 8 AM, and from 9 PM to 10 PM. These early-morning and late-evening peaks may reflect fatigue, poor visibility, alcohol influence, overspeeding, and reduced alertness.

*7. Driver Demographics:*

Driver involvement peaks at ages 18 to 19 and 44 to 45. Targeted awareness campaigns are needed for these groups.

*8. Correlations:*

Number of Fatalities is positively correlated with Number of Casualties. There is no strong direct correlation with speed limit or driver age.

*Recommendations for Reducing Road Accidents in India*

*1. Stricter Law Enforcement in High-Accident States:*

Focused deployment of traffic police and automated surveillance (e.g., speed cameras, red-light cameras) in states with the highest accident rates.

Increase fines and implement stricter penalties for violations such as overspeeding, rash driving, and failure to wear seatbelts/helmets.

Conduct regular road safety audits and enforce corrective measures immediately.

*2. Nighttime Lighting Improvements and Monitoring:*

Install better street lighting on highways, rural roads, and accident-prone urban intersections to improve night visibility.

Use smart lighting systems that adjust brightness based on weather and traffic conditions.

Increase night-time patrolling and set up sobriety checkpoints to catch drowsy or intoxicated drivers.

*3. Anti-Alcohol Driving Campaigns:*

Launch nationwide awareness campaigns highlighting the dangers of drinking and driving, especially targeting festive seasons and weekends.

Implement stricter blood alcohol concentration (BAC) limits and conduct frequent random breathalyzer tests.

Collaborate with bars, restaurants, and event organizers to promote designated driver programs and encourage safe alternatives like ride-sharing.

*4. Weather-Related Driving Alerts:*

Develop real-time weather advisory systems integrated into GPS apps and highway signboards to warn drivers about fog, rain, or poor road conditions.

Enforce speed limit reductions during adverse weather and provide designated safe parking zones during extreme weather events.

Educate drivers on safe driving techniques in different weather conditions through licensing programs and public service announcements.

*5. Clear Advance Warning for Construction Zones:*

Install large, highly visible warning signs several hundred meters before construction zones.

Use flashing lights, reflective signs, and electronic boards (especially at night or in low-visibility conditions).

*6. Training Programs for Younger and Middle-Aged Drivers:*

Introduce mandatory defensive driving courses for young drivers (under 25) and middle-aged drivers (30–45), who show higher accident involvement in this dataset.

Regular refresher courses for commercial drivers and fleet operators.

Incentivize participation through discounts on insurance premiums or tax benefits for individuals completing certified safety training.

*Conclusion*

Through data-driven insights, the analysis identifies major factors contributing to road accidents in India. The project recommends a combination of infrastructure improvements, stricter law enforcement, driver education, and real-time hazard management to significantly reduce accident rates and save lives.